
Add VLLMSoftEntailer for LLM-based conditional probability estimation #2

Merged
zipJiang merged 3 commits into main from claude/add-vllm-entailer-08cox
Feb 25, 2026
Conversation

@zipJiang

Implement a new entailer class that queries a vLLM-backed, OpenAI-compatible
server endpoint hosting Zhengping/conditional-probability-regression.
The entailer estimates p(h|p) by extracting the first-token distribution
over the special <|label_level_N|> tokens and computing a softmax-weighted
average of their midpoint scores, yielding a probability in [0, 1].

https://claude.ai/code/session_018eo6tgjgbqwGcoaaf45K2L
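The scoring step described above can be sketched as a pure function. A minimal illustration, assuming the label tokens partition [0, 1] into equal-width bins whose midpoints are averaged; the number of levels and the midpoint formula are assumptions for illustration, not taken from the model card:

```python
import math

# Assumed number of <|label_level_N|> bins; illustrative only.
NUM_LEVELS = 10

def level_midpoint(i: int, num_levels: int = NUM_LEVELS) -> float:
    """Midpoint of the i-th equal-width bin on [0, 1]."""
    return (i + 0.5) / num_levels

def soft_score(label_logprobs: dict) -> float:
    """Softmax over the restricted label-token log-probabilities,
    then the expectation of the bin midpoints -> value in [0, 1]."""
    # Subtract the max log-prob for numerical stability before exp.
    m = max(label_logprobs.values())
    weights = {i: math.exp(lp - m) for i, lp in label_logprobs.items()}
    z = sum(weights.values())
    return sum(level_midpoint(i) * w / z for i, w in weights.items())
```

In practice the log-probabilities would come from the first generated position of the vLLM server's response, restricted to the label tokens.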

Replace urllib-based HTTP handling with the official openai Python client
for cleaner, typed interaction with the vLLM OpenAI-compatible server.
Add openai as a project dependency in pyproject.toml and requirements.txt.

https://claude.ai/code/session_018eo6tgjgbqwGcoaaf45K2L
Replace sequential sync OpenAI calls with concurrent async requests
using AsyncOpenAI and asyncio.gather. The client is created per-batch
inside an async context manager to avoid stale event-loop bindings.

Built-in exponential-backoff retries (configurable via max_retries)
handle transient failures (connection errors, HTTP 429/5xx). A
per-request timeout is also configurable.

https://claude.ai/code/session_018eo6tgjgbqwGcoaaf45K2L
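The per-batch fan-out pattern that commit describes can be sketched as follows. This is a stand-in illustration: score_one is a placeholder for the real AsyncOpenAI chat-completion call, and all names here are assumptions, not the PR's API:

```python
import asyncio

async def score_one(premise: str, hypothesis: str) -> float:
    # Placeholder for the HTTP round trip to the vLLM server; in the PR
    # this would await an AsyncOpenAI completion request.
    await asyncio.sleep(0)
    return 0.5  # placeholder score

async def score_batch(pairs):
    # In the PR, the AsyncOpenAI client is opened per-batch (e.g. with
    # `async with AsyncOpenAI(...)`) so it binds to the current event
    # loop; all requests in the batch then run concurrently.
    tasks = [score_one(p, h) for p, h in pairs]
    return await asyncio.gather(*tasks)

scores = asyncio.run(score_batch([("p1", "h1"), ("p2", "h2")]))
```

Creating the client inside the coroutine (rather than at construction time) is what avoids the stale event-loop binding the commit message mentions.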
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a04cb033e2


The flagged snippet (tail of _call_batch, truncated mid-docstring in the review):

        errors (connection failures, HTTP 429 / >=500) with
        exponential backoff.
        """
        return asyncio.run(self._async_call_batch(instances))

P2: Avoid asyncio.run in synchronous batch scorer

_call_batch unconditionally wraps every batch in asyncio.run, which raises RuntimeError whenever the caller already has an active event loop (e.g., Jupyter notebooks, pytest-asyncio, FastAPI workers). In those common environments this new entailer cannot be used at all, so experiments that switch to VLLMSoftEntailer will fail before scoring; this path is reached through the normal Entailer.__call__ flow, not just a special API.
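One common way to address this finding is a loop-aware wrapper: call asyncio.run when no loop is active, and otherwise run the coroutine on a private loop in a worker thread. A minimal sketch under that assumption (this is not the PR's code):

```python
import asyncio
import concurrent.futures

def run_sync(coro):
    """Run a coroutine to completion from synchronous code, whether or
    not an event loop is already running in this thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop: plain asyncio.run is safe here.
        return asyncio.run(coro)
    # Already inside a loop (Jupyter, pytest-asyncio, FastAPI workers):
    # execute the coroutine on a fresh loop in a separate thread and
    # block this thread on the result.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

The thread-pool branch trades a little overhead for correctness in nested-loop environments; libraries like nest_asyncio are another workaround, with different trade-offs.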


@zipJiang zipJiang merged commit d049bf3 into main Feb 25, 2026
1 check passed